Thank you all for your useful discussions and feedback. This is simply a
brief summary of the items which I see as most important, together with a
note on what I propose to do next.

  Mike

*****************************************************************************

1) Directory structure

At present, directories are scattered across the disc, with the objective of
keeping each directory close to the files that it contains.

Jonathan has suggested instead that all directory information be co-located.

Advantages:

  - No disc seeks when following long path names
  - More efficient use of space on disc (separate directories => wasted space
     at the end of each one)

Disadvantages:

  - Whenever it's necessary to access directory information, a disc seek is
     likely to occur (for example, suppose we have just loaded file $.x.c
     and now wish to list the contents of directory $.x.d)
  - More complex allocation scheme required to try to keep all directory
     data together and separate from files


1A) Btrees

If (1) is worthwhile, then Jonathan has suggested a Btree representation.

Advantages:

  - Guaranteed performance in both space and time domains which is close to
     the "typical" performance
  - Known technology

Disadvantages:

  - "Small" changes (from a user's viewpoint) may result in significant
     reorganisations of the Btree - for example, renaming a top level
     directory.
  - Entries within one directory are not necessarily next to each other -
     and may be a long way apart - so directory listings may take longer.

Possibilities:

  - Could hold the contents of very small files inside the Btree nodes
     themselves.
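
The two disadvantages above can be illustrated with a small sketch. It
assumes (the discussion did not settle this) that the Btree holds one entry
per object, keyed by full path name; a sorted Python list stands in for the
leaf level of the tree. The names children and rename_cost, and the sample
paths, are hypothetical.

```python
from bisect import bisect_left

# ASSUMPTION: one key per object, keyed by full path name. A sorted list
# stands in for the leaf level of a Btree; this is not a real Btree.
keys = sorted([
    "$.x", "$.x.c", "$.x.c.main", "$.x.c.util", "$.x.d", "$.x.d.notes", "$.y",
])


def children(index, directory):
    """Return (position, key) for each immediate child of `directory`."""
    prefix = directory + "."
    found = []
    # All keys sharing the prefix form one contiguous run in sorted order.
    for pos in range(bisect_left(index, prefix), len(index)):
        key = index[pos]
        if not key.startswith(prefix):
            break
        if "." not in key[len(prefix):]:   # skip deeper descendants
            found.append((pos, key))
    return found


def rename_cost(index, directory):
    """Count the keys that would need rewriting if `directory` were renamed."""
    prefix = directory + "."
    return sum(1 for k in index if k == directory or k.startswith(prefix))


print(children(keys, "$.x"))      # children sit at positions 1 and 4
print(rename_cost(keys, "$.x"))   # the directory plus every descendant
```

Under this keying, the two children of $.x are separated by the contents of
$.x.c, and renaming $.x means rewriting every key beneath it. A different
choice of key (for example, parent identifier plus leaf name) would change
both results.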


2) Alternative scheme to allow limited movement between chunks

John's proposal has three aspects:

  a) Different access mechanism for shared objects
  b) Concept of a "chunk group"
  c) Use of "indirect blocks" for large files

After discussion, it seems that (a) is mostly an implementation detail and
that it is (b) and (c) that are important.


2A) Large file allocation unit smaller than shared chunk size

The idea of a "chunk group" is to allow small objects to move within an area
that is larger than the size of a large file allocation unit. This creates
more opportunities to free up chunks for large files when a number of
adjacent shared chunks are sparsely used.

There are a number of trade-offs here - clearly the most flexible scheme is
one in which the chunk group is the whole disc, but this also results in a
very large chunk group table!
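
As a rough illustration of the benefit, the sketch below repacks the small
objects of one chunk group so that whole chunks fall free. The function name
compact_group, the sizes, and the use of first-fit decreasing are all
assumptions made for the example, not part of John's proposal.

```python
def compact_group(used, chunk_size):
    """Repack small objects within one chunk group.

    `used` maps chunk number -> list of object sizes held in that chunk.
    Returns (new_layout, freed_chunks). First-fit decreasing is used purely
    for illustration; it is not claimed to be the intended strategy.
    """
    objects = sorted(
        (size for sizes in used.values() for size in sizes), reverse=True
    )
    chunks = sorted(used)
    layout = {c: [] for c in chunks}
    free = {c: chunk_size for c in chunks}
    for size in objects:
        target = next((c for c in chunks if free[c] >= size), None)
        if target is None:
            # First-fit is not optimal: give up and keep the old layout.
            return used, []
        layout[target].append(size)
        free[target] -= size
    freed = [c for c in chunks if not layout[c]]
    return layout, freed


# Four sparsely used shared chunks of 8 units each, in one chunk group.
layout, freed = compact_group({0: [3], 1: [2, 1], 2: [4], 3: [1]}, 8)
print(layout, freed)
```

Here the five objects fit into chunks 0 and 1, leaving chunks 2 and 3 free
for a large file. The larger the group, the more such opportunities exist,
at the cost of a larger table recording where each object now lives.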


2B) Immutable object IDs

John would like to have an identifier for a file that is guaranteed never to
change. If it is possible to design a satisfactory allocation strategy based
on (2A)'s approach to disc structures, then this can be achieved if large
files are always indirected through a small file.
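
One reading of this is sketched below: the identifier names a small record,
and the record lists the chunks of the large file, so the data can move
without the identifier changing. The class Disc and its methods are
hypothetical, and the sketch assumes that small files themselves can be
given stable identifiers by the (2A) structures.

```python
class Disc:
    """Toy model: an object id names a small file; a large file's chunks
    are reached only through that small file (the "indirect block")."""

    def __init__(self):
        self.small = {}     # object id -> small-file record
        self.next_id = 1

    def create_large(self, chunks):
        """Create a large file occupying `chunks`; return its object id."""
        oid = self.next_id
        self.next_id += 1
        self.small[oid] = {"chunks": list(chunks)}
        return oid

    def relocate(self, oid, new_chunks):
        """Move the large file's data; only the small record is updated."""
        self.small[oid]["chunks"] = list(new_chunks)

    def chunks_of(self, oid):
        return list(self.small[oid]["chunks"])


disc = Disc()
oid = disc.create_large([10, 11])
disc.relocate(oid, [40, 41, 42])    # file grows and moves
print(oid, disc.chunks_of(oid))     # same id, new location
```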


3) Concern that the allocation strategy will not work

I think we all share this concern!

FileCore's current strategy is proven, but we do not know if it can be
adapted to work effectively with larger discs. On the other hand, we do
know that some changes will be necessary - for example, space utilisation is
not as good as it might be for discs of the order of 100M. The only way
forward here appears to be to examine the code in order to deduce the
allocation strategy employed.

My current proposal - such as it is - is likely to suffer in two ways:

  - When the disc becomes full (and allocation is difficult), it will no
    longer be possible to store files "close" to their parents. When space
    again becomes available, they will remain "far away" - and will also
    block other attempts to localise.

  - A situation may arise where there is plenty of space available in part-
    filled shared chunks, but no unallocated chunks; it is then not possible
    to allocate space for any more large files.
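
The second problem can be made concrete with a small sketch. It assumes a
disc modelled as a list of per-chunk usage figures, where 0 means the chunk
is unallocated and anything else means it is a part-filled shared chunk; the
function names and figures are hypothetical.

```python
def free_space_report(usage, chunk_size):
    """Return (unallocated_chunks, slack_in_shared_chunks).

    `usage` holds the units used in each chunk; 0 means unallocated.
    """
    unallocated = sum(1 for u in usage if u == 0)
    slack = sum(chunk_size - u for u in usage if u > 0)
    return unallocated, slack


def can_place_large_file(usage, chunk_size, chunks_needed):
    """A large file needs whole unallocated chunks; slack does not help."""
    unallocated, _ = free_space_report(usage, chunk_size)
    return unallocated >= chunks_needed


usage = [5, 3, 6, 2]                        # four shared chunks of 8 units
print(free_space_report(usage, 8))          # no free chunks, 16 units slack
print(can_place_large_file(usage, 8, 1))    # cannot place even one chunk
```

Half the disc is free in this example, yet a one-chunk large file cannot be
placed unless small objects are allowed to move.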


4) Concern that we're reinventing the wheel again

Old habits die hard, of course ...

John has pointed me to a couple of articles about variants of the Unix FFS
from which some further good ideas have come. However, these filing systems
tend to rely on large numbers of buffers, and often have large numbers of
parameters that a system administrator is expected to tweak: in particular,
discs are always partitioned into separate areas for inodes, files, and - in
one case - directories.

*****************************************************************************

What next?

Somewhat perversely, I propose to spend another couple of days trying to
integrate John's ideas with my "fixed size chunk" proposal, and see if I can
arrive at a more credible allocation strategy.

If this fails, I guess I'll have to knuckle down and look at all that
FileCore code ...

Next on the agenda is to consider the implications of a separate directory
structure and the use of Btrees.

So far I've received no pointers to documents/papers about the disc
structures and/or allocation strategies used by filing systems other than
Unix. Locating such information is often a time-consuming process involving
library searches through periodicals, and telephone calls to manufacturers.
There may be a gold-mine just around the corner, but I'm not very optimistic:
after all, what would Acorn's response be to a request for information on
our FileCore allocation strategy?

I don't at present plan to invest time in such a search, but will review any
information I find on an opportunistic basis: please let me know if any of
you think differently.

*****************************************************************************
